In the course MSB104 Econometrics, this year's assignment is handed in as four parts spread over the semester. The assignments must be written and calculated in R. We are group one, and the countries we cover in our assignment are Denmark, France, Hungary, Portugal and Slovakia.
In this assignment we downloaded two datasets from Eurostat. The data contain GDP (nama_10r_3gdp) and population (demo_r_pjanaggr3,pi) for countries over the last 20 years, at NUTS3 level. Once we have extracted the information we need from the datasets, we calculate GDP per capita and describe the data using the metadata description from Eurostat.
In the second part of this assignment we use our data to calculate population-weighted GDP Gini coefficients at the European NUTS2 (j) level and describe the new data. We then plot the distribution of the Gini coefficients. At the end of the first assignment we discuss whether there are noteworthy outliers.
There are different types of GDP values, and the unit is stored in a column named “UNIT”. We have chosen to use values where the unit is MIO_EUR, which gives the GDP value in million euros.
We made a new dataset called gdppop. In this dataset we gather the information we need from the other two datasets and combine it into one dataset. This makes our ongoing research easier.
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| Year | 3465 | 2010 | 6.056 | 2000 | 2005 | 2015 | 2020 |
| GDP | 3465 | 15488.351 | 22719.905 | 557 | 3816.47 | 18363.95 | 251623.58 |
| unit | 3465 | | | | | | |
| … MIO_EUR | 3465 | 100% | | | | | |
| Population | 3327 | 585736.518 | 477586.457 | 39583 | 261285.5 | 709348.5 | 2863272 |
To calculate GDP per capita, we divided GDP by population and multiplied by one million, because GDP is recorded in million euros. The result is GDP per capita in euros.
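As a minimal sketch of this calculation, the R code below uses a small made-up data frame to show the conversion from million euros to euros per person. The name `gdppop_toy` and its values are hypothetical and are not the Eurostat data.

```r
# Hypothetical toy rows; the real gdppop data comes from Eurostat.
gdppop_toy <- data.frame(
  Region     = c("XX001", "XX002"),
  GDP        = c(5000, 12000),       # million euros (unit MIO_EUR)
  Population = c(250000, 400000)
)

# GDP is in million euros, so multiply by 1e6 to get euros per person.
gdppop_toy$gdp_per_capita <- gdppop_toy$GDP / gdppop_toy$Population * 1e6
print(gdppop_toy)
```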
A brief note on what the summary shows. The minimum and maximum are the lowest and highest GDP per capita observed across the regions. The first quartile is the value below which 25% of the observations fall, and the third quartile is the value below which 75% fall. The median is the middle observation: half of the values lie below it and half above. The mean is the average observation across all regions.
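To make these definitions concrete, here is a small sketch with made-up numbers (the vector `x` is hypothetical). It shows how one large value pulls the mean above the median, which is the pattern we see in our own summaries.

```r
# Hypothetical values with one large observation
x <- c(10, 20, 30, 40, 100)

# Quartiles and median (R's default quantile method)
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
m <- mean(x)

print(summary(x))
# The mean is pulled upwards by the largest value; the median is not.
print(m > q[["50%"]])
```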
We then look at GDP, population and GDP per capita in our five countries, and at the result when all the NUTS3 regions are pooled.
| GDP_per_Capita |
|---|
| 2.24e+04 |
From the summary we can see that GDP per capita at NUTS3 level for all our countries was 22,438.88 euros. Since we have no other values to compare it with, this number alone does not tell us much. We will therefore look at the NUTS3 regions for each country.
Summary of the combined dataset. Region and unit are character columns with 3327 entries each.
| Statistic | Year | GDP | Population | gdp_per_capita |
|---|---|---|---|---|
| Min. | 2000 | 683.7 | 39583 | 2976 |
| 1st Qu. | 2005 | 3873.0 | 261286 | 14754 |
| Median | 2010 | 8012.6 | 428922 | 22260 |
| Mean | 2010 | 15668.1 | 585736 | 22439 |
| 3rd Qu. | 2015 | 18416.7 | 709348 | 26560 |
| Max. | 2020 | 251623.6 | 2863272 | 116235 |
It is not easy to draw conclusions from the summary above. GDP varies widely: the lowest value is 683.7 and the highest is 251,623.6 (million euros). The maximum is also far above the rest of the distribution, since the 3rd quartile is only 18,416.7. Population and gdp_per_capita show the same pattern, with a very low minimum and a very high maximum.
To get a better picture of the different countries and to see whether some regions stand out, we make summaries per country. This gives us the option of excluding regions that stand out in further assignments. We also look at one specific year for all the countries so that the results can be compared.
To get the result by country, we made a summary of GDP per capita for each country. Such a summary gives an overview of whether there are major inequalities between the regions within a country. If we find such deviations, we can choose to remove some regions so that the inequalities do not become too large.
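A per-country summary of this kind can be sketched in base R as follows. The sketch assumes that the first two characters of the NUTS3 code identify the country; the data frame `toy` is hypothetical and holds only four rounded values taken from our tables.

```r
# Hypothetical subset: four regions with rounded 2010 values from our tables
toy <- data.frame(
  Region         = c("DK011", "DK014", "FR101", "FRI22"),
  gdp_per_capita = c(64695, 29700, 87600, 18154)
)

# Assumption: the country code is the first two characters of the NUTS code
toy$country <- substr(toy$Region, 1, 2)

# One summary() per country
by_country <- tapply(toy$gdp_per_capita, toy$country, summary)
print(by_country)
```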
Since Hungary and Slovakia have few NUTS3 regions, we chose to look at them together.
Denmark is a small country in Scandinavia and does not have many NUTS3 regions. Copenhagen is the capital of Denmark, and in the plot it stands out together with the surrounding region. We can see that the region around the capital has had higher growth. Capital cities are often richer than other regions, and we see that in Denmark as well.
To get an overview, we look at 2010 for all the regions in Denmark to see which regions are the wealthiest and which are the least wealthy.
The wealthiest regions in Denmark
| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | DK011 | 6.47e+04 |
| 2010 | DK012 | 6.38e+04 |
| 2010 | DK032 | 4.27e+04 |
The poorest regions in Denmark
| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | DK022 | 2.93e+04 |
| 2010 | DK014 | 2.97e+04 |
| 2010 | DK021 | 3.15e+04 |
Summary of gdp_per_capita for Denmark in 2010 (11 regions):
| Statistic | gdp_per_capita |
|---|---|
| Min. | 29276 |
| 1st Qu. | 32793 |
| Median | 36862 |
| Mean | 40570 |
| 3rd Qu. | 41504 |
| Max. | 64695 |
The three wealthiest regions in Denmark are Copenhagen (DK011), the Copenhagen surroundings (DK012) and Sydjylland (DK032). The three least wealthy regions in Denmark are Vest- og Sydsjælland (DK022), Bornholm (DK014) and Østsjælland (DK021).
Denmark has 11 NUTS3 regions. The capital and its surrounding region stand out among the wealthiest, while the poorer regions are more even. The gap between the wealthiest and the poorest region is about 35,000 euros (64,695 versus 29,276).
| Statistic | Population | gdp_per_capita |
|---|---|---|
| Min. | 73851 | 8292 |
| 1st Qu. | 290938 | 21721 |
| Median | 523771 | 24361 |
| Mean | 643505 | 26286 |
| 3rd Qu. | 818596 | 27871 |
| Max. | 2606234 | 116235 |
France is a country in Western Europe that also has overseas regions in other parts of the world. France has the largest land area in the EU, which is reflected in its more than one hundred NUTS3 regions.
Since France has so many NUTS3 regions, it is not easy to distinguish them from each other. Here too the capital, Paris, stands out. The summary shows a large difference between the minimum and the maximum. Looking more closely at the numbers, this may be because France's overseas regions are included. These regions are located near Africa and in South America, and they pull down France's overall GDP per capita. For further research, we have chosen to remove these regions from the dataset.
The wealthiest regions in France
| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | FR105 | 9.24e+04 |
| 2010 | FR101 | 8.76e+04 |
| 2010 | FRK26 | 4.1e+04 |
The poorest regions in France
| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | FRI22 | 1.82e+04 |
| 2010 | FRJ21 | 1.88e+04 |
| 2010 | FRF32 | 1.93e+04 |
Summary of gdp_per_capita for France in 2010 (96 regions):
| Statistic | gdp_per_capita |
|---|---|
| Min. | 18154 |
| 1st Qu. | 21951 |
| Median | 24289 |
| Mean | 26566 |
| 3rd Qu. | 28190 |
| Max. | 92362 |
France now has 96 NUTS3 regions. The three wealthiest regions are Hauts-de-Seine (FR105), Paris (FR101) and Rhône (FRK26). Paris and Hauts-de-Seine stand out clearly from the other regions, with a difference of approximately 50,000 euros in GDP per capita. The three poorest regions are Creuse (FRI22), Ariège (FRJ21) and Meuse (FRF32). These regions have almost the same GDP per capita.
Hungary and Slovakia are both countries in Central Europe. Here we see that the capitals Bratislava and Budapest both had strong growth until 2008. Budapest then declined a little, while Bratislava had a larger increase. The rest of the regions have had steadier growth.
The wealthiest regions in Hungary and Slovakia
| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | SK010 | 3.27e+04 |
| 2010 | HU110 | 2.19e+04 |
| 2010 | SK021 | 1.42e+04 |
The poorest regions in Hungary and Slovakia
| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | HU313 | 4.39e+03 |
| 2010 | HU323 | 5.38e+03 |
| 2010 | HU332 | 5.72e+03 |
Summary of gdp_per_capita for Hungary and Slovakia in 2010 (28 regions):
| Statistic | gdp_per_capita |
|---|---|
| Min. | 4395 |
| 1st Qu. | 6629 |
| Median | 8018 |
| Mean | 9561 |
| 3rd Qu. | 10106 |
| Max. | 32670 |
Hungary and Slovakia have 28 regions together. The three wealthiest are Bratislava (SK010), Budapest (HU110) and Trnava (SK021), so the capital region is the richest in both countries. The least wealthy regions are Nógrád (HU313), Szabolcs-Szatmár-Bereg (HU323) and HU332. All three of the poorest regions belong to Hungary, which means that Hungary's poorest regions have a lower GDP per capita than Slovakia's. We also see a large difference between the minimum and the maximum within Hungary.
There is a difference of about 28,000 euros between the wealthiest region (Bratislava, 32,670) and the poorest (Nógrád, 4,395); within Hungary, the gap between Budapest and Nógrád is about 17,500 euros. Bratislava is the richest region, and since none of Slovakia's regions are among the poorest, this suggests that Slovakia has a higher GDP per capita than Hungary.
Portugal is a country in southern Europe. Portugal also has two archipelagos, each forming its own region. We see steady growth in all the regions: all had a slight decline between 2010 and 2012, followed by an increase. In Portugal, too, the capital Lisbon stands out as the richest region.
The wealthiest regions in Portugal
| Region | gdp_per_capita |
|---|---|
| PT170 | 2.41e+04 |
| PT181 | 2.14e+04 |
| PT150 | 1.7e+04 |
The poorest regions in Portugal
| Region | gdp_per_capita |
|---|---|
| PT11C | 9.92e+03 |
| PT16J | 1.05e+04 |
| PT11B | 1.08e+04 |
Summary of gdp_per_capita for Portugal (25 regions):
| Statistic | gdp_per_capita |
|---|---|
| Min. | 9919 |
| 1st Qu. | 12449 |
| Median | 14708 |
| Mean | 14558 |
| 3rd Qu. | 15762 |
| Max. | 24120 |
Portugal has 25 regions. The wealthiest are Lisboa (PT170), Alentejo Litoral (PT181) and Algarve (PT150). Lisboa has a slightly higher GDP per capita than the other wealthy regions. The poorest regions are Tâmega e Sousa (PT11C), Beiras e Serra da Estrela (PT16J) and Alto Tâmega (PT11B). The difference in GDP per capita between the wealthiest and the poorest region is approximately 14,000 euros.
In all the countries, the capitals stand out the most and have the highest GDP per capita. The Paris area has the highest GDP per capita of all, and the poorest regions are found in Hungary.
Furthermore, we will use the data to calculate population-weighted GDP Gini coefficients for our countries at the NUTS2 (j) level.
A Gini coefficient lies between 0 and 1. A value of 0 means perfect equality, and the closer the value is to 1, the greater the inequality between rich and poor. A Gini coefficient is calculated from how much wealth or income there is and how it is distributed among the population. Once we have calculated the Gini coefficients, we also check the data for outliers. Outliers are values that are either very high or very low compared to the rest of the data.
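The exact formula is not shown in this report, so the following is only a sketch. It assumes that the population-weighted Gini of a NUTS2 region is the population-share-weighted mean absolute difference in GDP per capita between its NUTS3 regions, divided by twice the weighted mean. The function name `weighted_gini` and the toy numbers are hypothetical.

```r
# Population-weighted Gini (sketch; assumed formula, see the text above)
# y:   GDP per capita of each NUTS3 region within one NUTS2 region
# pop: population of each NUTS3 region
weighted_gini <- function(y, pop) {
  stopifnot(length(y) == length(pop), all(pop > 0))
  p <- pop / sum(pop)                 # population shares
  y_bar <- sum(p * y)                 # population-weighted mean
  abs_diff <- abs(outer(y, y, "-"))   # |y_i - y_j| for all pairs
  sum(outer(p, p) * abs_diff) / (2 * y_bar)
}

g_two    <- weighted_gini(c(1, 3), c(100, 100))        # two unequal regions
g_equal  <- weighted_gini(c(5, 5, 5), c(10, 20, 30))   # identical regions
g_single <- weighted_gini(20000, 500000)               # only one region
print(c(g_two, g_equal, g_single))
```

Under this formula a NUTS2 region that contains only one NUTS3 region always gets a Gini of exactly 0, because there is no other region to compare it with.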
Total Gini for all five countries and all years: 0.2846063
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| gini_n2 | 46 | 0.057 | 0.046 | 0 | 0.03 | 0.073 | 0.261 |
First, we look at all the regions in the selected countries. We have 46 observations. The total Gini for all 5 countries over the last 20 years is 0.28. We think this pooled figure blurs the differences between countries, so we look at each country individually. We also see Ginis that are exactly 0, which would mean perfect equality. We will look at each country to find out where and why some regions have a Gini of 0. To look more specifically at the countries, we have chosen to look only at the year 2010.
| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | DK01 | 0.114 |
| 2010 | DK02 | 0.0153 |
| 2010 | DK03 | 0.053 |
| 2010 | DK04 | 0.00978 |
| 2010 | DK05 | 0 |
Outliers in Denmark
| nuts2 |
|---|
| DK05 |
In the graph above, two regions have varied quite a bit over the past 20 years. They still stay below 0.025, which shows that there is little difference between rich and poor in these regions. Another region that stands out is DK01, which lies higher in the graph than the other regions. Its Gini is further from 0, so the difference between rich and poor is greater here than in the other regions.
In Denmark, the Gini coefficients are between 0 and 0.11. The region Nordjylland (DK05) is an outlier with a Gini of 0 in 2010. This value most likely reflects that DK05 consists of a single NUTS3 region, so there is nothing to compare within it, rather than true equality. Denmark has only five NUTS2 regions, which does not give us much data to work with.
| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | FR10 | 0.261 |
| 2010 | FRB0 | 0.0624 |
| 2010 | FRC1 | 0.0728 |
| 2010 | FRC2 | 0.0722 |
| 2010 | FRD1 | 0.0717 |
| 2010 | FRD2 | 0.0567 |
| 2010 | FRE1 | 0.0521 |
| 2010 | FRE2 | 0.0318 |
| 2010 | FRF1 | 0.0452 |
| 2010 | FRF2 | 0.0836 |
| 2010 | FRF3 | 0.0373 |
| 2010 | FRG0 | 0.052 |
| 2010 | FRH0 | 0.0677 |
| 2010 | FRI1 | 0.0813 |
| 2010 | FRI2 | 0.0342 |
| 2010 | FRI3 | 0.0239 |
| 2010 | FRJ1 | 0.0707 |
| 2010 | FRJ2 | 0.13 |
| 2010 | FRK1 | 0.0611 |
| 2010 | FRK2 | 0.118 |
| 2010 | FRL0 | 0.068 |
| 2010 | FRM0 | 0.0493 |
Outliers in France
| nuts2 |
|---|
In France, there are so many regions that it is difficult to see them properly in the graph above. What the graph does show is that the vast majority of regions follow each other closely and stay below 0.1. A few stand out: FR10 is the highest with a Gini of about 0.3, while FRJ1 and FRK1 fluctuate quite a bit from 2005 to 2020.
In France, the Gini coefficients are between 0.02 and 0.26. France therefore has a slightly higher Gini than Denmark, which means that inequality is slightly greater in France. France does not have any outliers after we removed the FRY regions.
| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | HU11 | 0 |
| 2010 | HU12 | 0 |
| 2010 | HU21 | 0.0686 |
| 2010 | HU22 | 0.086 |
| 2010 | HU23 | 0.0299 |
| 2010 | HU31 | 0.065 |
| 2010 | HU32 | 0.0764 |
| 2010 | HU33 | 0.0515 |
| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | SK01 | 0 |
| 2010 | SK02 | 0.0712 |
| 2010 | SK03 | 0.054 |
| 2010 | SK04 | 0.0814 |
Outliers in Hungary and Slovakia
| nuts2 |
|---|
| HU11 |
| HU12 |
| SK01 |
Hungary and Slovakia have large fluctuations in their regions. One of the regions that fluctuates the most is HU22, which is down to a Gini of 0.075 in 2012 and up to approximately 0.12 in 2016. Hungary and Slovakia are small countries in Central Europe, and we assume that this is the reason for the large fluctuations.
In Hungary, the Gini coefficients are between 0 and 0.08. Hungary has two outlier regions, Budapest (HU11) and Pest (HU12). Both have a Gini of 0 for the period we are looking at, most likely because each consists of a single NUTS3 region.
In Slovakia, the Gini coefficients are also between 0 and 0.08. Slovakia has one outlier region, SK01. That leaves Slovakia with only three regions from which we obtain data.
| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | PT11 | 0.0849 |
| 2010 | PT15 | 0 |
| 2010 | PT16 | 0.0671 |
| 2010 | PT17 | 0 |
| 2010 | PT18 | 0.0747 |
| 2010 | PT20 | 0 |
| 2010 | PT30 | 0 |
Outliers in Portugal
| nuts2 |
|---|
| PT15 |
| PT17 |
| PT20 |
| PT30 |
In the graph for Portugal, we can see that the Gini in several of the regions has been declining over the past 20 years. That is to say, the differences between rich and poor have narrowed over the years. PT18 stands out somewhat in that there are strong fluctuations over the years.
In Portugal, the Gini coefficients are between 0 and 0.08. Portugal has four outlier regions: Algarve (PT15), Lisboa (PT17), Região Autónoma dos Açores (PT20) and Região Autónoma da Madeira (PT30). Região Autónoma dos Açores and Região Autónoma da Madeira are both archipelagos belonging to Portugal, which may be the reason why they are outliers. Portugal does not have many NUTS2 regions, and when four of them have a value of 0, we cannot place much confidence in the result.
To summarize what has been done in assignment 1: we have calculated GDP per capita for all our countries combined and for each country separately. When we pooled all the countries we got 22,805.13 euros in GDP per capita. Denmark had a significantly higher GDP per capita than the other countries. Hungary had the lowest, with only 8781.69. France has overseas regions on other continents, and we chose to remove these so that GDP per capita would not be affected by regions located on other continents.
When we look at outliers for our countries, we find none in France, but we do find them in all the other countries. All the outlier regions are regions that have only one province. In Hungary, Slovakia and Portugal (HU11, HU12, SK01, PT15 and PT17) the outliers are mostly linked to the capitals. Capitals are often large areas that have only one region. Portugal also has two island groups (PT20 and PT30) that come up as outliers, and these are small regions. The last outlier region is found in Denmark (DK05), which is also a small region.
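One way to check which NUTS2 regions have only one province is to count the NUTS3 codes per NUTS2 code. The sketch below assumes that the NUTS2 code is the first four characters of the NUTS3 code; the vector `nuts3` is a small illustrative selection of codes, not our full dataset.

```r
# Illustrative selection of NUTS3 codes (not the full dataset)
nuts3 <- c("DK011", "DK012", "DK013", "DK014", "DK050", "HU110", "HU120")

# Assumption: the NUTS2 code is the first four characters of the NUTS3 code
nuts2 <- substr(nuts3, 1, 4)

# Count NUTS3 regions per NUTS2 region and keep those with exactly one
n_per_nuts2 <- table(nuts2)
single <- names(n_per_nuts2)[n_per_nuts2 == 1]
print(single)
```

A NUTS2 region that appears in `single` has no within-region variation, so a Gini of 0 for it says nothing about inequality.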
In the second assignment we look at growth and inequality. We estimate the effect of regional development on regional inequality for the year 2010. Then we discuss the goodness of fit of our estimated model. We plot the relationship between regional development and regional inequality together with the fitted line corresponding to our estimate. We also plot the residuals against the predicted values of our model. We then discuss the classical OLS assumptions in light of our data and plots, as well as other determinants of inequality.
We will also go back to Eurostat's web pages and download, for our subset of countries, regional (NUTS2, j) data related to transport infrastructure, education and demographics. We are supposed to select one variable per category whose relationship to regional inequality we would like to explore further. We will estimate a multiple linear regression model with our new variables for 2010 and give a short interpretation of our findings. In the end we will discuss the overall fit of our model and the inference related to our findings.
We start assignment 2 by getting the datasets from Eurostat. In the education dataset we look at the number of people with higher education, and in the transport dataset at the length of motorways in kilometres. In the demographic dataset we look at life expectancy.
Further in the assignment, we look at growth and inequality in the countries at NUTS2 level.
Before we go further we want to make new variables for the dataset. Moving forward we will fit linear models (lm), starting with simple regression. A simple regression model shows the relationship between two variables (R for everyone, p. 265). By using this model we can find the Y value when X = 0.
The Gini value goes from 0 to 1, where 0 is perfect equality and 1 is maximal inequality. As we can see in the summary there are gini…. When the Gini is 0, it is likely that some data are missing for 2010. We will therefore use filter to remove the Ginis that are zero.
In this ggplot, we can see how the Gini is distributed per country. France has a point that stands out from the others, while the other countries have most of their points between 0 and 0.1. We need to keep in mind that Denmark, Hungary, Portugal and Slovakia do not have many observations once the countries are divided into NUTS2 regions. That is why we choose to look only at France by itself and at all the countries together when we move forward in the assignment.
To estimate the effect of regional development on regional inequality we can use the formula:
\[ \text{Regional inequality}_i = \beta_1 + \beta_2 \, \text{Regional development}_i + u_i \]
In this equation the intercept \(\beta_1\) is the predicted regional inequality when regional development is 0, and the slope \(\beta_2\) shows how much inequality changes for each one-unit increase in development.
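As an illustration of the formula, the sketch below fits a simple regression with `lm()` on made-up numbers. The data frame `toy` and its column names are hypothetical and stand in for our Gini and GDP per capita variables.

```r
# Hypothetical data: development (x) and inequality (y)
toy <- data.frame(
  development = c(10, 20, 30, 40, 50),
  inequality  = c(0.03, 0.05, 0.06, 0.08, 0.10)
)

fit <- lm(inequality ~ development, data = toy)

beta1 <- coef(fit)[["(Intercept)"]]   # predicted inequality when development = 0
beta2 <- coef(fit)[["development"]]   # change in inequality per unit of development
print(c(beta1 = beta1, beta2 = beta2))
```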
In this ggplot we look at the relationship between Gini and GDP per capita in 2010 for all countries. Very few of the points lie on the line when we look at all the countries together.
| | Denmark | France | Hungary_Slovakia | Portugal | Total |
|---|---|---|---|---|---|
| gdp_per_capita | 0.405 | 0.758 *** | 0.293 | -0.396 | 0.125 * |
| | (0.138) | (0.081) | (0.299) | (0.885) | (0.056) |
| const. | -11806.326 | -13096.883 *** | 4054.608 | 13294.565 | 3983.460 ** |
| | (5781.918) | (2225.178) | (2550.306) | (12829.967) | (1429.894) |
| N | 4 | 22 | 9 | 3 | 38 |
| R2 | 0.813 | 0.814 | 0.121 | 0.167 | 0.122 |
| Note: *** p < 0.001; ** p < 0.01; * p < 0.05. T statistics in brackets. | | | | | |
The table above shows the impact that regional development (X) has on regional inequality (Y), that is, how much Y changes when X increases by one unit. Portugal is the only country with a negative coefficient. We will take a closer look at this country by country. From this and earlier observations we choose to remove Paris from the France2010 dataset.
In the simple regressions we have run, R2 helps us judge how closely the variables are related. R2 tells us how much of the variation in the dependent variable is explained by the independent variable. R2 lies between 0 and 1. When R2 is 1, the independent variable explains all of the variation in the dependent variable. If it is 0, the independent variable explains none of it (Forskningsmetode, p. 345). Denmark and France have a high R2, 0.813 and 0.814 respectively. Denmark has only 4 observations, so we cannot fully trust this result. France, on the other hand, has 22 observations, and more observations give a more reliable result. When we look at all the countries together, we can see that both R2 is only 0.122, we can see that in the context of……
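To show what R2 measures, the sketch below computes it by hand as one minus the ratio of the residual sum of squares to the total sum of squares, and compares it with the value reported by `summary()`. The data are made up for illustration.

```r
# Hypothetical data
toy <- data.frame(
  development = c(10, 20, 30, 40, 50),
  inequality  = c(0.03, 0.05, 0.06, 0.08, 0.10)
)
fit <- lm(inequality ~ development, data = toy)

ss_res <- sum(resid(fit)^2)                                   # unexplained variation
ss_tot <- sum((toy$inequality - mean(toy$inequality))^2)      # total variation
r2_manual  <- 1 - ss_res / ss_tot
r2_summary <- summary(fit)$r.squared
print(c(manual = r2_manual, summary = r2_summary))
```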
A regression analysis gives us a line that best fits the observations. A residual is the vertical distance from an observation to this best-fit line. Residuals can be positive or negative: if the observation lies above the line the residual is positive, and if it lies below the line the residual is negative.
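The sketch below, again on made-up data, shows that a residual is the observed value minus the fitted value and that the residuals of an OLS model with an intercept sum to zero.

```r
# Hypothetical data
x <- c(10, 20, 30, 40, 50)
y <- c(0.03, 0.05, 0.06, 0.08, 0.10)
fit <- lm(y ~ x)

res <- resid(fit)            # observed minus fitted
above_line <- res > 0        # TRUE where the point lies above the fitted line

print(round(res, 4))
print(above_line)
```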
When we look at all the countries together, we can see that there are points both above and below the regression line.
As we said earlier, we now look only at France, because it has 22 observations and is more trustworthy than the others, and at all countries together. In both the graph for all the countries and the one for France, some of France's regions stand out. Ideally the points would lie on the line, but as we see above most of the regions are above or below it, which means there is large variation between the regions. In the graph, we can also see that France has a much steeper line than all the countries together. A steeper line means that a given change in GDP per capita goes together with a larger change in the Gini.
For linear regression there are seven classical OLS assumptions. The first six are usually the ones needed to produce the best estimates.
Assumption 1: The regression model is linear in the coefficients and the error term
When we look at all the countries it is close to linear, but when we look at France, which has the most NUTS2 regions, we can see that
Assumption 2: The error term has a population mean of zero
We can see that there is variation in the X variable both in France and in all the countries combined, which may indicate that we have fulfilled the second requirement for OLS.
Assumption 3: All independent variables are uncorrelated with the error term
The third assumption is about having a random dataset. Even though we have made changes such as removing regions in France, we would still say that we have a random dataset based on the population.
Assumption 4: Observations of the error term are uncorrelated with each other
Assumption 5: The error term has a constant variance (no heteroscedasticity)
When we look at the graph for France the line is not flat at all, which indicates heteroscedasticity.
Assumption 6: The error term is normally distributed (optional)
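```{r normal q-q plot}
# Assuming `ols` is a fitted lm object: qqnorm() needs its residuals, not the model itself
qqnorm(resid(ols))
qqline(resid(ols))
```
When we look at the normal Q-Q plot, the residuals do not appear to be normally distributed.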
https://www.datasciencecentral.com/7-classical-assumptions-of-ordinary-least-squares-ols-linear/
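Assumption 5 can also be checked with a test instead of reading it off a plot. The sketch below is not something we ran on our data; it computes the studentized (Koenker) version of the Breusch-Pagan statistic by hand on simulated data, where the error spread grows with x by construction. All names and numbers are hypothetical.

```r
set.seed(1)
x <- 1:50
# Error spread grows with x, so the errors are heteroscedastic by construction
y <- 1 + 0.5 * x + rnorm(50, sd = 0.2 * x)
fit <- lm(y ~ x)

# Auxiliary regression of the squared residuals on the regressor
aux <- lm(resid(fit)^2 ~ x)

# LM statistic n * R^2, chi-squared with 1 degree of freedom (one regressor)
bp_stat <- length(x) * summary(aux)$r.squared
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(statistic = bp_stat, p_value = bp_pval))
```

A small p-value would be evidence against constant error variance.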
We have chosen to look at motorway kilometres, education and life expectancy. We want to see what the new variables tell us about the Gini.
In a multiple linear regression model, we use several explanatory variables to explain variation in the Gini (Y). We can use the formula:
\[ Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i}+... + \beta_kX_{ki}+u_i \]
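A model of this form can be fitted with `lm()` by adding the explanatory variables on the right-hand side. The sketch below uses simulated data; the data frame `toy`, its columns and the numbers used to generate it are hypothetical and do not come from Eurostat.

```r
set.seed(42)
n <- 30
toy <- data.frame(
  gdp_per_capita = runif(n, 15000, 40000),
  education      = runif(n, 15, 45),
  lifeexp        = runif(n, 75, 84)
)
# Simulated outcome; the coefficients here are arbitrary
toy$gini <- 0.02 + 1e-6 * toy$gdp_per_capita + 5e-4 * toy$education +
  rnorm(n, sd = 0.01)

fit <- lm(gini ~ gdp_per_capita + education + lifeexp, data = toy)
coef_table <- summary(fit)$coefficients   # estimates, std. errors, t and p values
print(round(coef_table, 6))
```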
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| gdp_per_capita | 0.613 | 0.495 | -1.028 | 0.416 | 0.712 | 0.501 |
| | (0.520) | (0.205) | (NaN) | (0.107) | (NaN) | (NaN) |
| Education | 544.974 | | -5620.249 | | 773.104 | |
| | (1279.489) | | (NaN) | | (NaN) | |
| Motorway | | 20.253 | 160.693 | | | 19.432 |
| | | (29.146) | (NaN) | | | (NaN) |
| Lifeexp | | | | -2527.220 | -2821.860 | -2480.632 |
| | | | | (1652.517) | (NaN) | (NaN) |
| const. | -34060.858 | -20252.333 | 150689.366 | 187990.794 | 179714.043 | 176204.288 |
| | (52787.904) | (13885.965) | (NaN) | (130721.394) | (NaN) | (NaN) |
| N | 4 | 4 | 4 | 4 | 4 | 4 |
| R2 | 0.841 | 0.874 | 1.000 | 0.944 | 1.000 | 1.000 |
| Note: *** p < 0.001; ** p < 0.01; * p < 0.05. T statistics in brackets. | | | | | | |
In Denmark, we read models 1, 2 and 4 as showing a negative relationship between the variables, while model 5 (education and life expectancy) and model 6 (motorway and life expectancy) show a positive one. R2 is 1.000 in models 3, 5 and 6, but that is because these models estimate four parameters from only four observations, so they fit the data exactly and the standard errors are NaN. With only 4 observations from Denmark, it is not possible to conclude whether these variables are related to each other.
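The perfect fit with so few observations can be reproduced on made-up numbers. In the sketch below (all values hypothetical), a model with an intercept and three regressors is fitted to four observations, which leaves zero residual degrees of freedom.

```r
# Four hypothetical observations, three regressors plus an intercept
toy <- data.frame(
  y  = c(0.11, 0.02, 0.05, 0.01),
  x1 = c(45, 30, 33, 31),
  x2 = c(40, 28, 35, 27),
  x3 = c(80.1, 79.2, 79.9, 79.5)
)
fit <- lm(y ~ x1 + x2 + x3, data = toy)
s <- summary(fit)

r2 <- s$r.squared                       # the fit is exact
se <- s$coefficients[, "Std. Error"]    # not estimable without residual df
print(fit$df.residual)
print(r2)
print(se)
```

An R2 of 1 in this situation is a consequence of the sample size, not evidence of a strong relationship.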
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| gdp_per_capita | 0.610 * | 0.514 | 0.460 | 0.502 | 0.487 | 0.344 |
| | (0.245) | (0.291) | (0.297) | (0.268) | (0.273) | (0.314) |
| Education | -99.006 | | -92.916 | | -64.507 | |
| | (95.904) | | (96.604) | | (101.650) | |
| Motorway | | 2.004 | 1.867 | | | 1.942 |
| | | (2.045) | (2.055) | | | (2.008) |
| Lifeexp | | | | 838.106 | 693.520 | 823.480 |
| | | | | (632.397) | (682.321) | (633.720) |
| const. | -6263.825 | -7875.226 | -3535.765 | -74859.196 | -60697.052 | -70598.569 |
| | (7685.735) | (6935.886) | (8286.314) | (48460.099) | (54101.928) | (48747.192) |
| N | 21 | 21 | 21 | 21 | 21 | 21 |
| R2 | 0.344 | 0.340 | 0.374 | 0.367 | 0.381 | 0.400 |
| Note: *** p < 0.001; ** p < 0.01; * p < 0.05. T statistics in brackets. | | | | | | |
France is the country with the most observations. Here models 2, 4 and 6 show positive effects on the Gini. However, every R2 is quite low (0.34 to 0.40), so we cannot say with certainty that these variables have any effect on the Gini.
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| gdp_per_capita | 0.432 | 0.209 | 0.889 | 0.399 | 0.463 | 0.403 |
| | (0.551) | (0.308) | (0.519) | (0.700) | (0.807) | (0.700) |
| Education | 56.488 | | 317.996 | | 51.957 | |
| | (182.446) | | (205.217) | | (214.170) | |
| Motorway | | -8.937 | -19.707 | | | -9.369 |
| | | (8.544) | (10.367) | | | (9.368) |
| Lifeexp | | | | -319.675 | -128.476 | -596.659 |
| | | | | (1873.970) | (2187.748) | (1894.324) |
| const. | 1919.114 | 6237.527 | -3153.585 | 27039.266 | 11327.858 | 49242.688 |
| | (7418.924) | (3282.168) | (6742.626) | (134766.511) | (160421.754) | (136582.962) |
| N | 9 | 9 | 9 | 9 | 9 | 9 |
| R2 | 0.134 | 0.256 | 0.497 | 0.125 | 0.135 | 0.271 |
| Note: *** p < 0.001; ** p < 0.01; * p < 0.05. T statistics in brackets. | | | | | | |
Again we have chosen to merge Hungary and Slovakia for the multiple regression analysis. Here we read models 4 and 5 as having a positive effect on the Gini. Model 5 has education and life expectancy as variables, but its R2 is rather low (0.135), so the relationship is weak in any case.
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| gdp_per_capita | 6.898 | -3.723 | 6.898 |
| | (NaN) | (NaN) | (NaN) |
| Education | 3978.251 | | 3978.251 |
| | (NaN) | | (NaN) |
| Lifeexp | | -4768.800 | |
| | | (NaN) | |
| const. | -379852.307 | 443293.670 | -379852.307 |
| | (NaN) | (NaN) | (NaN) |
| N | 3 | 3 | 3 |
| R2 | 1.000 | 1.000 | 1.000 |
| Note: *** p < 0.001; ** p < 0.01; * p < 0.05. T statistics in brackets. | | | |
Portugal had no observations for Motorway, so we removed this variable and are left with three models. All three models report an R2 of 1.000 and standard errors of NaN. With only 3 observations and at least three estimated parameters, each model fits the data exactly, so we cannot put much faith in these R2 values.
France is the only country with many observations in 2010, and it probably gives us the best basis for judging whether the model fits our data. We want to find out whether education and the number of motorway kilometres have an effect on inequality in France, and whether these variables are connected to how wealthy the various regions are.
```{r}
library(dplyr)
library(ggplot2)
library(broom)
library(knitr)

# Education against Gini for the Danish NUTS2 regions in 2010
BNP2010 %>%
  filter(Year == 2010 & nuts0 == "DK") %>%
  ggplot(aes(x = edu, y = Gini, fill = id_nuts2, color = id_nuts2)) +
  geom_point(size = .8) +  # `lwd` is not a geom_point parameter; `size` is assumed to be the intent
  labs(x = "Education", y = "Gini")

# Simple regression of the Gini on education for France
lm(Gini2 ~ Edu, data = France2010) %>%
  tidy() %>%
  kable(digits = 2)
```
Lecture
```{r}
library(lmtest)  # coeftest()
library(car)     # hccm()

reg <- lm(gdp_n2 ~ lea + gdp_per_capita + Edu, data = gdppop2010)
coeftest(reg)
reg2 <- lm(log(gdp_n2) ~ log(lea) + log(gdp_per_capita) + log(Edu), data = gdppop2010)
# coefficients() has no vcov argument; coeftest() accepts a robust covariance function
coeftest(reg, vcov. = hccm)
```
#### Appendix
Plots of the residuals against the predicted values of our model
<img src="Assignment-MSB104_files/figure-html/unnamed-chunk-26-1.png" width="672" />
<img src="Assignment-MSB104_files/figure-html/unnamed-chunk-27-1.png" width="672" />